Abstract. This paper presents a financial analysis over Twitter sentiment
analytics extracted from listed retail brands. We investigate whether
there is statistically-significant information between the Twitter sentiment
and volume, and stock returns and volatility. Traditional newswires
are also considered as a proxy for the market sentiment for comparative
purposes. The results suggest that social media is indeed a valuable source
in the analysis of the financial dynamics in the retail sector even when
compared to mainstream news such as the Wall Street Journal and Dow
Jones Newswires.
1 Introduction

Major news announcements can have a high impact on the financial market and 
investor behaviour resulting in rapid changes or abnormal effects in financial 
portfolios. As human responsiveness is limited, automated news analysis has 
been developed as a fundamental component to algorithmic trading. In this way, 
traders can shorten the time of reaction in response to breaking stories. The basic 
idea behind these news analytics technologies is to predict human behavior and 
automate it, so traders may be able to anticipate asset movements before making 
an investment or risk management decision. 

Twitter data has also become an increasingly important source to describe 
financial dynamics. It provides a fine-grained real-time information channel that 
includes not only major news stories but also minor events that, if properly 
modelled, can provide ex-ante information about the market even before the 
main newswires. Recent developments have reflected this prominent role of social 
media in the financial markets. One major example is the U.S. Securities and 
Exchange Commission report allowing companies to use Twitter to announce 
key information in compliance with Regulation Fair Disclosure [18]. Twitter has
also shown that it can have a fast and drastic impact. In 2013, in the so-called
Hash Crash, a hacked Twitter account of the American news agency Associated
Press falsely disclosed a message about an attack on the White House, causing a
drop in the Dow Jones Industrial Average of 145 points in minutes [25].

In m we proposed a new model for sentiment classification using Twitter. 
We combined the traditional lexicon approach with a support vector machine
algorithm to achieve better predictive performance. In the present work, we use the
dataset of sentiment analytics from m to investigate the interplay between the
Twitter sentiment extracted from listed retail brands, and financial stock returns
and volatility. We verify whether there is statistically-significant information in
this relationship and also compare it to a corresponding analysis using sentiment
from traditional newswires. We consider volatility and log-returns as financial
endogenous variables and we take Twitter sentiment and volume as exogenous
explanatory variables in the financial dynamics of the selected stocks. We also
consider traditional newswires as a data source for comparative purposes.
Therefore, the main objectives are: (i) to verify whether there is statistically-significant
information between the Twitter sentiment, and stock returns and volatility and
(ii) to compare this interplay with the one obtained using mainstream news as a
proxy for the market sentiment. The main contribution of this work is empirical
evidence that supports the use of Twitter as a significant data source in the context
of financial markets in the retail industry even when compared to traditional newswires.

2 Literature Review 

The market impact of news has long been studied since
the seminal work of [5]. In that work, the authors investigate to what extent
macroeconomic news explains return variance and also analyse the observed
market moves following major political and world events. More recently, [23] provided
the first evidence that news media content can predict movements in broad
indicators of stock market activity. The authors found a correlation between high/low
media pessimism and high market trading volume. They further analyse the
relation between the sentiment of news, earnings and return predictability [24].
Since then, with the availability of machine-readable news and the use of
sentiment analysis, several works have found news to be a significant source for financial
applications: ^ found a positive correlation between the number of mentions of
a company in the Financial Times and its stock's volume; [12] investigate the
effect of news on the behavior of traders; [13] analyse the Thomson Reuters
News Analytics (TRNA) and find a causality between sentiment and both volatility
and liquidity.

Recent research supports the hypothesis that Twitter data also has
statistically-significant information related to financial indicators. As one of the first
investigations analysing Twitter in the context of financial markets, [2] analyse the text
content of daily Twitter feeds to identify two types of mood: (i) polarity (positive
vs. negative) and (ii) emotions (calm, alert, sure, vital, kind, and happy). They
were able to increase the accuracy in the prediction of the DJIA index. Similar
work m was able to predict not only the DJIA index but also the NASDAQ-100
index; the authors measured the agreement of sentiment between messages in
addition to the market mood. More recently, [28] combined Information
Theory with sentiment analysis to demonstrate that Twitter sentiment can contain
statistically-significant ex-ante information on future prices of the S&P500 index
and also identified a subset of securities in which hourly changes in social media
sentiment do provide lead-time information. As a contribution to the field
of event study research, [22] offer a methodology to analyse market reactions to
combinations of different types of news events using Twitter to identify which
news items are more important from the investor perspective. In a similar way, m
combine sentiment analytics with the identification of Twitter peaks in an event
study approach to imply directions of market evolution. Further, exploring the
social network structure from Twitter users, [26] provide empirical evidence of a
financial community in Twitter in which users' interests align with the financial
market.

Similar to the present work, investigate the causality between polarity 
measures from Twitter and daily return of closing prices. The authors also use 
sentiment derived from a Support Vector Machine model to classify the tweets 
into the positive, negative and neutral categories. As a contribution compared to
this work, we investigate causality not only in returns but also in stock volatility.
We also provide a comparison between Twitter and traditional newswires.
Moreover, we concentrate the analysis on retail brands, which can provide
meaningful insights for applications in that domain.

Further examples of social media applications in the stock market are: the use 
of StockTwits sentiment and posting volume to predict daily returns, volatility 
and trading volume El; the extraction of features from financial message board 
for stock market predictions [19]; and approaches combining Twitter with other
sources such as blogs and news nziiiniii]. 

3 Dataset 

Our analysis is conducted on a set of five listed retail brands with stocks traded
in the US equity market, which we monitor during the period from November
1, 2013 to September 30, 2014. The investigated stocks, with their respective
Reuters Instrument Codes (RIC), are: ABERCROMBIE & FITCH
CO. (ANF.N), NIKE INC. (NKE.N), HOME DEPOT INC. (HD.N), MATTEL
INC. (MAT.N) and GAMESTOP CORP. (GME.N). The choice of companies is
bounded by the Twitter sentiment analytics dataset provided by m.

Given the companies selected, we consider three streams of time series data:
(i) the market data, given as daily stock prices; (ii) news metadata
supplied by m, which consists of 10,949 news stories from Dow Jones
Newswires, the Wall Street Journal and Barron's; and (iii) the social media data
analytics provided by m, which is based on 42,803,225 Twitter messages.

3.1 News Analytics 

The news analytics data supplied by m are provided in a metadata format
where each news story receives scores quantifying characteristics such as relevance
and sentiment with respect to a related individual stock. Table 1 shows a sample
of the news sentiment analytics data provided. The relevance score (Relevance)
of the news ranges between 0 and 100 and indicates how strongly related the
company is to the underlying news story, with higher values indicating greater
relevance. Usually, a relevance value of at least 90 indicates that the entity is
referenced in the main title or headline of the news item, while lower values
indicate references further down the story body. Here we keep only the news stories
with a relevance of 100. This increases the likelihood that a story considered
is related to the underlying equity. Besides the relevance, we also consider
the Event Sentiment Score (ESS). This measure indicates the short-term positive
or negative financial or economic impact of the news on the underlying company.
It ranges between 0 and 100, where higher values indicate more positive
sentiment and values below 50 indicate negative sentiment.


Table 1: News Sentiment Analytics. Each line represents a news story related 
to a company. The metadata considered consists of the relevance and sentiment 
scores and a timestamp. 


Story  Company         Date      Hour    Relevance  Event Sentiment Score (ESS)
1      NIKE INC.       20140104  210130  33         64
2      MATTEL INC.     20140105  41357   100        50
3      NIKE INC.       20140105  145917  93         88
4      NIKE INC.       20140105  150523  100        61
5      GAMESTOP CORP.  20140105  193507  44         50
6      GAMESTOP CORP.  20140106  170040  99         44
7      MATTEL INC.     20140106  222532  100        61
8      GAMESTOP CORP.  20140107  32601   100        50
9      MATTEL INC.     20140107  172628  55         40
10     NIKE INC.       20140110  204027  100        67


Given this metadata information, we first normalize the Event Sentiment
Score (ESS) of a given story at a timestamp $\Delta t$ such that it ranges between -1
and 1, and we label it as $ESS(\Delta t) \in [-1, 1]$. Then, we define the sentiment and
volume analytics for each company as:

- $G_{News}(t)$: daily number of positive News, i.e., daily total number of News
with $ESS(\Delta t) > 0$;

- $B_{News}(t)$: daily number of negative News, i.e., daily total number of News
with $ESS(\Delta t) < 0$;

- $V_{News}(t)$: daily total number of News;

- $SA_{News}(t)$: daily absolute sentiment from News:

$$SA_{News}(t) = G_{News}(t) - B_{News}(t) \qquad (1)$$

- $SR_{News}(t) \in [-1, 1]$: daily relative sentiment from News as the daily mean
of the sentiment score $ESS(\Delta t)$, $\Delta t \in [t, t+1)$.
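The news definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not give the normalization formula, so a linear map of the 0-100 score onto [-1, 1] with 50 as the neutral point is assumed; the names `normalize_ess` and `daily_news_analytics` are hypothetical; and $V_{News}$ is assumed to count the stories that survive the relevance filter. The sample stories are the NIKE INC. rows of Table 1.

```python
from collections import defaultdict

def normalize_ess(ess):
    """Assumed linear map of an ESS in [0, 100] onto [-1, 1] (50 is neutral)."""
    return (ess - 50) / 50

def daily_news_analytics(stories, min_relevance=100):
    """Aggregate (date, relevance, ess) stories into daily G, B, V, SA and SR."""
    per_day = defaultdict(list)
    for date, relevance, ess in stories:
        if relevance >= min_relevance:  # keep only the most relevant stories
            per_day[date].append(normalize_ess(ess))
    analytics = {}
    for date, scores in sorted(per_day.items()):
        g = sum(s > 0 for s in scores)  # positive stories
        b = sum(s < 0 for s in scores)  # negative stories
        analytics[date] = {
            "G": g,
            "B": b,
            "V": len(scores),
            "SA": g - b,                       # absolute sentiment, eq. (1)
            "SR": sum(scores) / len(scores),   # daily mean of normalized ESS
        }
    return analytics

# NIKE INC. rows of Table 1: (date, relevance, ESS)
nike = [("20140104", 33, 64), ("20140105", 93, 88),
        ("20140105", 100, 61), ("20140110", 100, 67)]
for day, values in daily_news_analytics(nike).items():
    print(day, values)
```

Days without any retained story are simply absent from the result; how such days are filled before the time-series analysis is not specified in the text.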


3.2 Twitter Analytics 

For the Twitter data analytics, we use the dataset from m. It provides
sentiment and volume metrics related to a company. We use the following analytics:

- $G_{Twitter}(t)$: daily number of positive English tweets;

- $B_{Twitter}(t)$: daily number of negative English tweets;

- $V_{Twitter}(t)$: daily total number of messages regardless of the language.

Table 2 shows an example of the Twitter sentiment analytics for the company
MATTEL INC. For the polarity classification, m employed a new approach
based on the combination of existing commonly used techniques (lexicon-based
and machine-learning-based) which outperformed standard benchmarks; see [11]
for further details. Notice that the numbers of positive, negative and neutral
messages do not sum to the total volume, as the former consider only English
tweets whereas the total volume covers all messages regardless of
the language. Also, although provided, we do not use the number of neutral
messages, as we believe that the extreme polarities (positive and negative) may
be more informative.


Table 2: Twitter Sentiment Analytics. Sample of analytics for the company
MATTEL INC. It shows the positive, negative and neutral English Twitter messages
related to the company and also the total number of messages regardless of the
language.

Date        CompanyID    Volume  Positive  Negative  Neutral
01/11/2013  MATTEL INC.  1,980   8         4         485
02/11/2013  MATTEL INC.  1,750   12        2         339
03/11/2013  MATTEL INC.  1,700   8         1         518
04/11/2013  MATTEL INC.  2,720   19        2         429
05/11/2013  MATTEL INC.  1,980   11        8         793
06/11/2013  MATTEL INC.  1,580   11        4         470
07/11/2013  MATTEL INC.  1,770   7         1         498
08/11/2013  MATTEL INC.  1,900   5         4         288
09/11/2013  MATTEL INC.  1,260   16        2         236
10/11/2013  MATTEL INC.  1,700   7         8         313





We hence compute the variables:

- $SA_{Twitter}(t)$: daily absolute sentiment from Twitter:

$$SA_{Twitter}(t) = G_{Twitter}(t) - B_{Twitter}(t) \qquad (2)$$

- $SR_{Twitter}(t) \in [-1, 1]$: daily relative sentiment from Twitter as

$$SR_{Twitter}(t) = \frac{G_{Twitter}(t) - B_{Twitter}(t)}{G_{Twitter}(t) + B_{Twitter}(t)} \qquad (3)$$

Notice that $SR_{Twitter}(t_0) = +1$ represents a day $t_0$ with the highest
positive sentiment for the company considered; conversely, $SR_{Twitter}(t_0) = -1$
indicates the highest negative sentiment, whereas we consider neutrality when
$SR_{Twitter}(t_0) = 0$.
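As a small worked example, the two Twitter sentiment measures can be computed directly from the daily positive and negative counts. The function names below are hypothetical, and the handling of a day with no polar tweets (returning 0, i.e. neutral) is an assumption, since the text does not say how that case is treated. The inputs are the first three rows of Table 2.

```python
def absolute_sentiment(g, b):
    """SA(t) = G(t) - B(t)."""
    return g - b

def relative_sentiment(g, b):
    """SR(t) = (G(t) - B(t)) / (G(t) + B(t)), a value in [-1, 1]."""
    total = g + b
    if total == 0:
        return 0.0  # assumption: a day without polar tweets is treated as neutral
    return (g - b) / total

# First three rows of Table 2 (MATTEL INC.): (date, positive, negative)
rows = [("01/11/2013", 8, 4), ("02/11/2013", 12, 2), ("03/11/2013", 8, 1)]
for date, g, b in rows:
    print(date, absolute_sentiment(g, b), round(relative_sentiment(g, b), 3))
```

Because the relative measure divides by the number of polar tweets only, it does not depend on the total message volume, which is why the volume is kept as a separate series.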

Although Twitter and News are distinct data sources, notice that we have
computed sentiment and volume analytics in such a way that we have comparable
time series between those sources.¹ This allows us to make a comparative study
between them while analysing the financial data defined below.

Table 3 shows a summary description of the selected companies with the
number of stories considered. We show the total number of News related to
each company and also the number of relevant news, i.e., those in which the news
story has a Relevance score equal to 100, as explained previously. Moreover,
we present the total number of tweets related to each company. Notice that the
Twitter dataset used does not provide any relevance score; therefore, there is no
further filtering process.


Table 3: Summary table of selected companies. The five retail brands selected
for the analysis along with their market capitalization. Also, we present the total
number of news and tweets in the selected period. The relevant news represent
the news filtered with the highest relevance score (100).

Company                  RIC    Market Cap.* ($ Billions)  Total No. of News  Relevant News  No. of Tweets
ABERCROMBIE & FITCH CO.  ANF.N  2.86                       1,608              174            1,352,643
NIKE INC.                NKE.N  67.39                      2,881              178            38,033,900
HOME DEPOT INC.          HD.N   111.57                     3,835              241            1,593,204
MATTEL INC.              MAT.N  15.02                      1,508              125            613,798
GAMESTOP CORP.           GME.N  6.41                       1,117              167            1,209,680

(*) Market Capitalization as of October 31, 2013. Source: Thomson Reuters Eikon.


¹ We may refer to a time series independently of a specific data source; in such cases
we will represent it by its original symbol but without the text subscript, e.g., the
number of positive stories will be represented as $G(t)$ when discussing both News $G_{News}(t)$
and Twitter $G_{Twitter}(t)$ in the same context.







4 Financial Variables 


Let $P(t)$ be the closing price of an asset at day $t$ and $R(t) = \log P(t) - \log P(t-1)$
its daily log-return. We consider the Excess of Log-return² of the asset over the
return of the market index $\bar{R}(t)$ as:

$$ER(t) = R(t) - \bar{R}(t). \qquad (4)$$

We consider the S&P500 daily returns as the market index $\bar{R}$.
As a proxy for the daily volatility of a stock, we define:

$$VOL(t) = 2\,\frac{P_{high}(t) - P_{low}(t)}{P_{high}(t) + P_{low}(t)} \qquad (5)$$

where $P_{high}(t)$ and $P_{low}(t)$ are the highest and the lowest price of the stock at
day $t$, respectively.
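A minimal sketch of the financial variables follows. The function names are hypothetical and the prices in the usage lines are made-up numbers, not market data. The volatility proxy is taken here to be the daily high-low range divided by the mid price (twice the range over the sum of high and low), which is an assumption about the exact form of equation (5).

```python
import math

def log_returns(close):
    """R(t) = log P(t) - log P(t-1) for consecutive closing prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(close, close[1:])]

def excess_log_returns(stock_close, index_close):
    """ER(t): stock log-return minus the market index log-return."""
    stock = log_returns(stock_close)
    index = log_returns(index_close)
    return [r - m for r, m in zip(stock, index)]

def range_volatility(high, low):
    """Assumed proxy: VOL(t) = 2 * (P_high - P_low) / (P_high + P_low)."""
    return [2 * (h - l) / (h + l) for h, l in zip(high, low)]

# Made-up prices purely for illustration
stock_close = [100.0, 102.0, 101.0]
index_close = [1800.0, 1809.0, 1805.0]
print(excess_log_returns(stock_close, index_close))
print(range_volatility(high=[103.0, 102.5], low=[99.0, 100.5]))
```

Note that a series of n closing prices yields n - 1 returns, so the return series starts one day later than the price series.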

Fig. 1 shows a sample of the calculated variables from Twitter for the
company Home-Depot Inc. It is interesting to notice a spike in volume and a decrease
in sentiment at the end of the period, which follows a corresponding drop in
excess of log-return. Further, Fig. 2 depicts the distribution of values of the
relative sentiment obtained from Twitter and News. We observe that both present
a skewed distribution, while news has a more neutral-centred distribution
compared to Twitter. It is important to notice that the sentiment provided by the
Twitter analytics presents a distinct proxy for sentiment compared to News, as
each company analysed depicts different positive/negative sentiment tones, e.g.,
NIKE's Twitter sentiment is highly positive while the news sentiment has
a mean around a neutral point.


² An alternative approach is to examine the alpha generation as the excess return of
the underlying stock relative to its benchmark adjusted for a given level of risk as
in the market model described in [8].





Fig. 1: Twitter's Sample Descriptive Analysis for Home-Depot Inc. Variables:
excess of log-return, $ER$; volatility, $VOL$; absolute sentiment, $SA_{Twitter}$;
number of positive messages, $G_{Twitter}$; and number of negative messages, $B_{Twitter}$.
There is a spike in volume and a decrease in sentiment at the end of the period,
which follows a corresponding drop in excess of log-return.








Fig. 2: Distribution of relative sentiment from Twitter $SR_{Twitter}(t)$ and News
$SR_{News}(t)$ for the companies: ABERCROMBIE & FITCH CO., GAMESTOP
CORP., HOME DEPOT INC., MATTEL INC. and NIKE INC. It is clear that
the sentiment provided by Twitter is a distinct proxy for market sentiment
compared to News, as each company analysed depicts different distributions of
sentiment.



























5 Method 


5.1 Granger Causality 

We are interested in investigating the statistical causality between sentiment 
and the financial variables. In this sense, [3] introduced a concept of cause-effect 
dependence where the cause not only should occur before the effect but also 
should contain unique information about the effect. Therefore, we say that $X$
Granger-causes $Y$ if the prediction of $Y$ can be improved using information
from both $X$ and $Y$ as compared to utilizing $Y$ only.

In a Vector Auto-Regressive (VAR) framework, we can assess the Granger
causality by performing an F-test to verify the null hypothesis that $Y$ is not
Granger-caused by $X$ and measure its probability of rejection within a confidence level.
Hence, assuming the VAR models:

$$Y_t = \alpha_0 + \alpha_1 Y_{t-1} + \dots + \alpha_k Y_{t-k} + \beta_1 X_{t-1} + \dots + \beta_k X_{t-k} + \epsilon_t, \qquad (6)$$

$$X_t = \gamma_0 + \gamma_1 X_{t-1} + \dots + \gamma_k X_{t-k} + \theta_1 Y_{t-1} + \dots + \theta_k Y_{t-k} + \xi_t, \qquad (7)$$

we take the null hypothesis in equation (8) and test it against its alternative
in equation (9). Since the coefficients $\beta_\tau$ multiply the lags of $X$ in the
equation for $Y$, a rejection of the null hypothesis implies that $X$
Granger-causes $Y$.

$$H_0 : \beta_1 = \beta_2 = \dots = \beta_k = 0 \qquad (8)$$

$$H_1 : \exists\, \beta_\tau,\ 0 < \tau \le k : \beta_\tau \ne 0 \qquad (9)$$

In the same way, the test for $Y$ Granger-causing $X$ can be done considering
equation (7) and taking the hypotheses from equations (8) and (9) in an
analogous way, i.e., on the coefficients $\theta_\tau$.
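The F-test described above compares a restricted regression (own lags only) with an unrestricted one (own lags plus lags of the candidate cause). The sketch below uses NumPy least squares rather than the R function used in the paper; the names `lagged`, `rss` and `granger_f` are hypothetical, and the data are synthetic, generated so that x leads y by one day. Turning the statistic into a p-value requires the F distribution with (k, T - 2k - 1) degrees of freedom, which is left out here.

```python
import numpy as np

def lagged(series, k):
    """Columns series[t-1], ..., series[t-k] for t = k, ..., n-1."""
    n = len(series)
    return np.column_stack([series[k - j:n - j] for j in range(1, k + 1)])

def rss(design, target):
    """Residual sum of squares of an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(target)), design])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return float(resid @ resid)

def granger_f(x, y, k=1):
    """F-statistic for H0: lags of x do not help predict y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    target = y[k:]
    rss_r = rss(lagged(y, k), target)  # restricted: own lags only
    rss_u = rss(np.column_stack([lagged(y, k), lagged(x, k)]), target)  # plus lags of x
    dof = len(target) - (2 * k + 1)    # observations minus estimated coefficients
    return ((rss_r - rss_u) / k) / (rss_u / dof)

# Synthetic example: y depends on yesterday's x, not the other way round
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.zeros(200)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=199)
print("x -> y:", granger_f(x, y, k=1))
print("y -> x:", granger_f(y, x, k=1))
```

Running the test in both directions, as in the last two lines, is what allows one-way and two-way causalities to be distinguished.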

For both News and Twitter, we test the Granger-causality between the
Excess of Log-return $ER$ and the number of positive stories $G$, the number
of negative stories $B$, the relative sentiment $SR$ and the absolute sentiment
$SA$. For the volatility $VOL$, we consider the total volume of stories $V$ in
addition to the previously mentioned variables. Furthermore, we perform the
Granger-causality test over the standardized versions of the time series
analysed, such that they have zero mean and unit standard deviation. To compute the
F-statistics of the Granger-causality tests we use the function anova.lm from the
package stats of the R Project for Statistical Computing [15].
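The standardization step can be illustrated as follows. The function name is hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption, since the text does not state which estimator is used. The input is the first four daily volumes of Table 2.

```python
from statistics import mean, stdev

def standardize(series):
    """Rescale a series to zero mean and unit (sample) standard deviation."""
    m, s = mean(series), stdev(series)
    return [(v - m) / s for v in series]

# First four daily Twitter volumes of Table 2 (MATTEL INC.)
z = standardize([1980, 1750, 1700, 2720])
print([round(v, 3) for v in z])
```

Standardizing puts series with very different scales, such as tweet counts and log-returns, on a comparable footing before the regressions are fitted.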

For the sole purpose of visualizing the Granger-causality results, we
create a Granger-causality graph $G = [V, E]$, where $V$ is a node set and $E$ is an
edge set. A node $u \in V$ represents a variable in the causality test and an edge
$e = (u, v)$ indicates that $u$ Granger-causes $v$ within a pre-defined significance
level. Further, we define $p(e)$ as an attribute of the edge: if $C$ is the set of
companies in which we have a causality between $u$ and $v$, then we set $p(e) = C$.
Fig. 3 shows an example of a Granger-causality graph that indicates that $u$
Granger-causes $v$ for the set of companies $C$.


Fig. 3: Granger-causality graph. The variable u Granger-causes the variable v 
for the set of companies C. 
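One simple way to hold such a graph in memory is a mapping from each directed edge (u, v) to the set of companies C for which the test is significant. The sketch below is an assumption about representation only; the function name, the tickers and the p-values are made up for illustration and are not results from this study.

```python
from collections import defaultdict

def causality_graph(p_values, alpha=0.05):
    """Map each edge (u, v) to the set of companies where u Granger-causes v."""
    graph = defaultdict(set)
    for (company, u, v), p in p_values.items():
        if p < alpha:  # keep only links significant at level alpha
            graph[(u, v)].add(company)
    return dict(graph)

# Hypothetical p-values keyed by (company, cause, effect)
p_values = {
    ("AAA.N", "SR", "ER"): 0.03,
    ("BBB.N", "SR", "ER"): 0.20,
    ("BBB.N", "G", "ER"): 0.01,
    ("AAA.N", "ER", "SR"): 0.60,
}
for edge, companies in causality_graph(p_values).items():
    print(edge, sorted(companies))
```

Variables without any significant link never appear as keys, which matches the convention of omitting them from the drawn graph.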


5.2 Predictive Analysis 

To evaluate the predictive power of sentiment we consider two auto-regressive
models, with and without sentiment, and conduct a 1-step ahead prediction
analysis:

$$M_0 : X(t) = \alpha + \sum_{\tau=1}^{k} \beta_\tau X(t-\tau) + \epsilon_t, \qquad (10)$$

$$M_1 : X(t) = \alpha + \sum_{\tau=1}^{k} \beta_\tau X(t-\tau) + \sum_{i=1}^{k} \gamma_i Y(t-i) + \epsilon_t, \qquad (11)$$

where

$$X(t) \in \{ER(t), VOL(t)\}, \qquad (12)$$

$$Y(t) \in \{G(t), B(t), SR(t), V(t)\}. \qquad (13)$$

As the absolute sentiment $SA(t)$ (of both News and Twitter) is already a linear
combination of the positive $G(t)$ and negative $B(t)$ stories, we do not consider
it in the linear regression for any dataset. Moreover, we consider a lag of only 1 day
for the sentiment variables and a lag of 2 days for the financial variable.³ Again,
we consider the standardized versions of the time series analysed.

Hence, we consider the following regression models for the excess of
log-return prediction:

$$M_0 : ER(t) = \alpha + \beta_1 ER(t-1) + \beta_2 ER(t-2) + \epsilon_t, \qquad (14)$$

$$M_1 : ER(t) = \alpha + \beta_1 ER(t-1) + \beta_2 ER(t-2) + \gamma_1 G(t-1) + \gamma_2 B(t-1) + \gamma_3 SR(t-1) + \epsilon_t \qquad (15)$$

³ A model selection approach can also be used in order to find an optimal lag for the
explanatory variables; examples of selection criteria are the Akaike information
criterion (AIC), the Bayesian information criterion (BIC) and Mallows's Cp. See [3].

For the volatility prediction using news as a data source, we do not include the
volume time series $V_{News}(t)$ as an explanatory variable in the regression because
of its high correlation with the number of positive and negative news already
taken into account in the model. Notice that, for the Twitter case, the volume
time series also considers non-English messages, which are not taken into account
by the time series $G_{Twitter}(t)$ and $B_{Twitter}(t)$. Therefore, we keep $V_{Twitter}(t)$
as an explanatory variable in the Twitter model. As a result, we have for news:


$$M_0 : VOL(t) = \alpha + \beta_1 VOL(t-1) + \beta_2 VOL(t-2) + \epsilon_t, \qquad (16)$$

$$M_1 : VOL(t) = \alpha + \beta_1 VOL(t-1) + \beta_2 VOL(t-2) + \gamma_1 G_{News}(t-1) + \gamma_2 B_{News}(t-1) + \epsilon_t \qquad (17)$$

and for Twitter:

$$M_0 : VOL(t) = \alpha + \beta_1 VOL(t-1) + \beta_2 VOL(t-2) + \epsilon_t, \qquad (18)$$

$$M_1 : VOL(t) = \alpha + \beta_1 VOL(t-1) + \beta_2 VOL(t-2) + \gamma_1 G_{Twitter}(t-1) + \gamma_2 B_{Twitter}(t-1) + \gamma_3 V_{Twitter}(t-1) + \epsilon_t \qquad (19)$$

Forecasting accuracy is measured by comparing the residuals of the two models
in terms of Residual Standard Error:



( 20 ) 


where $T$ is the total number of points, $n$ is the number of degrees of freedom of
the model, $\hat{y}_i$ are the predicted values and $y_i$ are the observed ones.
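The comparison between the two models can be sketched as below. Several points are assumptions rather than the paper's stated procedure: the Residual Standard Error is computed with the conventional definition, the square root of the residual sum of squares divided by the number of observations minus the number of estimated coefficients; the error reduction is taken as the relative decrease of that quantity in percent; the function names are hypothetical; and the data are synthetic.

```python
import numpy as np

def rse(design, target):
    """Assumed definition: sqrt(RSS / (T - p)) for an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(target)), design])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return float(np.sqrt(resid @ resid / (len(target) - X.shape[1])))

def error_reduction(x, sentiment):
    """Percent RSE reduction of M1 (2 lags of x plus 1 lag of each sentiment series) over M0."""
    x = np.asarray(x, float)
    target = x[2:]
    own = np.column_stack([x[1:-1], x[:-2]])  # x(t-1), x(t-2)
    extra = np.column_stack([np.asarray(s, float)[1:-1] for s in sentiment])  # y(t-1)
    rse0 = rse(own, target)
    rse1 = rse(np.column_stack([own, extra]), target)
    return 100 * (rse0 - rse1) / rse0

# Synthetic example: the financial series follows yesterday's sentiment
rng = np.random.default_rng(1)
g = rng.normal(size=300)
er = np.zeros(300)
er[1:] = 0.9 * g[:-1] + 0.1 * rng.normal(size=299)
print(round(error_reduction(er, [g]), 2))
```

Under this definition the added regressors always lower the residual sum of squares but also reduce the denominator, so an uninformative sentiment series can yield a slightly negative reduction.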

6 Results and Discussion 

Here we present the results from the Granger-causality tests and the
predictive analysis between the financial variables and sentiment data from Twitter
and news. The sentiment predictive analysis and its Granger-causality tests are
carried out in a 1-step-ahead fashion. We investigate the statistical significance of
the sentiment variables with regard to movements in returns and volatility and
we compare the Twitter results with news. We provide empirical evidence that
Twitter is moving the market with respect to the excess of log-returns for a subset
of stocks. Also, Twitter presents a stronger relationship with stock returns
compared to news in the selected retail companies. On the other hand, the Twitter
sentiment analytics showed a weaker relationship with volatility compared to
news.

6.1 Excess of Log-Returns 

We analyse the dynamics of excess of log-returns of the stocks considered in 
relation to absolute and relative sentiments and also with the number of positive 
and negative stories. 

Fig. 4 shows the Granger-causality graph that summarises the significant
Granger-causalities (p-value < 0.05) between the excess of log-return and the
sentiment variables for both news and Twitter. See Table 5 for the detailed
results. We observe that Twitter's sentiment analytics present more significant
points compared to news. Twitter's relative sentiment and its number
of positive messages Granger-cause log-returns for the companies
GAMESTOP CORP. and MATTEL INC., respectively. Twitter's absolute sentiment also
Granger-causes returns for MATTEL INC. while having a two-way significant
(p-value < 0.01) Granger-causality for the company HOME DEPOT INC. The
number of negative stories alone has no significant relationship with returns, but
combined with the number of positive stories in the form of relative and absolute
sentiment it proves to be an important measure. The news analytics have only
one significant relationship, observed in the number of positive news
Granger-causing the excess of log-returns for the company GAMESTOP CORP.


Fig. 4: Granger-causality graph for (a) Twitter and (b) news. It shows the
significant points in the Granger-causality test between excess of log-returns ($ER$)
and the sentiment analytics: number of positive stories ($G$), number of
negative stories ($B$), absolute sentiment ($SA$) and relative sentiment ($SR$). Sentiment
variables that presented no significant causality are not shown in the graph.


The solution of the multiple regression analysis in Table agrees with the 
Granger-causality tests as it shows Twitter with a higher number of significant
sentiment coefficients compared to news. The company MATTEL INC. in
particular presents all sentiment coefficients with high significance (p-value < 0.01),
suggesting that the Twitter sentiment analytics is indeed relevant in the
prediction of the next-day excess of log-return. The companies HOME DEPOT
INC. and GAMESTOP CORP. also presented significant sentiment coefficients.
For the News analytics, the sentiment was significant only for the company
GAMESTOP CORP. Further, analysis of the Residual Standard Error of the
models with and without sentiment variables in Table 4 shows that the use of
the Twitter sentiment variables reduced the error of the model with only market
data for the companies MATTEL INC., HOME DEPOT INC. and GAMESTOP
CORP., while the news sentiment improved the prediction only for the company
GAMESTOP CORP.

In sum, the Twitter analytics surprisingly showed a stronger causality with
stock returns compared to news. It is interesting to notice that we did not
perform any explicit filtering process in the Twitter analytics. However, we only
considered the extremes of polarity (positive and negative categories), i.e., we did
not consider the neutral tweets. This suggests that the sentiment classification
itself is indirectly filtering the noise in the data, in the sense that the non-neutral
tweets are the informative ones. Moreover, the increased causality compared to
news indicates a prominent role of Twitter in the retail industry. We believe
that Twitter acts as a feedback channel for the retail brands in a more timely and
fine-grained way compared to News.


Table 4: Residual standard error improvement in prediction of excess of
log-return $ER(t)$ using $SR(t)$, $G(t)$ and $B(t)$ compared to the model with only
market data.

                         Error Reduction (%)
Company                  NEWS    TWITTER
NIKE INC.                -2.41   -0.58
ABERCROMBIE & FITCH CO.  -1.26   -0.60
HOME DEPOT INC.          -0.99   1.23
MATTEL INC.              -0.48   2.82
GAMESTOP CORP.           8.34    1.10
6.2 Volatility


Here, we analyse the interplay between message volume and sentiment and
stock volatility. As volume measures we consider: the number of positive and
negative English stories and also the total volume of stories, regardless of the
language. As sentiment analytics we consider: absolute sentiment and relative
sentiment.

Fig. 5 shows the significant links (p-value < 0.05) of the Granger-causality
test between the volatility and the sentiment variables. See Table for the
detailed results. Overall, there are more significant points of causality for the News
sentiment analytics compared to Twitter. We observe that the number of positive
stories and the total volume both Granger-cause volatility for Twitter as well
as for news, but more companies are affected by news. The absolute sentiment
Granger-causes volatility only for news, observed in the company
ABERCROMBIE & FITCH CO. (ANF.N). The relative sentiment and the number of negative
stories do not Granger-cause volatility; on the other hand, volatility Granger-causes
negative news for the company GAMESTOP CORP. (GME.N).


Fig. 5: Granger-causality graph for (a) Twitter and (b) news. It shows the
significant points in the Granger-causality test between volatility ($VOL$) and the
sentiment analytics: total number of stories ($V$), number of positive stories ($G$),
number of negative stories ($B$), absolute sentiment ($SA$) and relative sentiment
($SR$). Sentiment variables with no significant causality are not shown in the
graph.








The solution of the multiple regression analysis in Table shows that the 
number of positive stories is a significant variable for both news and Twitter, 
being more significant for the former. The number of negative stories showed 
no relevance in both regressions. The total volume of Twitter was relevant only 
for the company NKE.N. Moreover, analysis of the Residual Standard Error of 
the models with and without sentiment variables in Table 7 shows that both
Twitter and News were able to reduce the error in prediction for a subset of the
companies. In the cases where the model was improved with sentiment, News
provided a higher reduction of error compared to Twitter.

Overall, the news analytics showed a higher causality with volatility
compared to Twitter. This confirms the predictive power of News with volatility as
described in the literature [a [Ml E [lo]. Further improvements in entity
detection in the sentiment classification algorithm used in the dataset provided by m
may improve the Twitter results.


Table 7: Residual standard error improvement in the prediction of volatility
$VOL(t)$ using $G(t)$, $B(t)$ and $V(t)$ compared to the model with only market
data.

                         Error Reduction (%)
Company                  NEWS    TWITTER
NIKE INC.                1.36    1.08
ABERCROMBIE & FITCH CO.  4.03    -0.52
HOME DEPOT INC.          2.46    1.10
MATTEL INC.              -2.21   -0.36
GAMESTOP CORP.           14.99   0.20
7 Conclusion


We showed that measures of the Twitter sentiment extracted from listed retail
brands have a statistically-significant relationship with stock returns and volatility.
While analysing the interplay between the excess of log-return and the Twitter
sentiment variables we conclude that: (i) Twitter presented a stronger
Granger-causality with stock returns compared to news; (ii) positive tweets and
Twitter's sentiment Granger-cause excess of log-returns for a subset of companies;
(iii) Twitter's sentiment analytics is indeed relevant in the prediction of the
next-day excess of log-return even compared to traditional newswires. Moreover, in
the volatility analysis we found that: (i) Twitter's analytics showed a weaker
relationship with volatility compared to the one observed with returns; (ii) the
number of positive tweets and the total volume both Granger-cause volatility for
some companies but present reduced Granger-causality compared to news; (iii)
the number of positive tweets is a significant variable for the 1-step ahead
prediction of volatility while the number of negative messages showed no relevance.
Overall, the Twitter sentiment analytics proved to be a distinct and
complementary proxy of market sentiment compared to news in the analysis of the
financial dynamics of retail brands' stocks. Surprisingly, Twitter's sentiment
presented a relatively stronger relationship with stock returns compared
to traditional newswires. The results suggest that social media analytics have a
prominent role in the dynamics of the retail sector in the financial markets.